HDFS-15865 Interrupt DataStreamer thread if no ack #2728
mukul1987 merged 3 commits into apache:trunk from
@karthikhw, the patch generally looks good to me. However, in which cases is nodes null?
@mukul1987 I have not pinned down exactly when nodes becomes null, but it appears to happen when the client cannot reach a datanode in the middle of a write. It looks like the next retry then comes in with nodes set to null.
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
    newScope("waitForAckedSeqno")) {
  LOG.debug("{} waiting for ack for: {}", this, seqno);
  int dnodes = nodes != null ? nodes.length : 3;
  int writeTimeout = dfsClient.getDatanodeWriteTimeout(dnodes);
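For readers following the thread, the general idea of bounding the wait for an ack can be sketched as below. This is only an illustrative sketch, not the code in this patch: the class `AckWaiter`, its `lock` and `lastAckedSeqno` fields, and the choice to throw `InterruptedIOException` on timeout are all assumptions made for the example.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the ack bookkeeping; not the real DataStreamer. */
final class AckWaiter {
  private final Object lock = new Object();
  private long lastAckedSeqno = -1;

  /** Called by a (hypothetical) response-processing thread when an ack arrives. */
  void ack(long seqno) {
    synchronized (lock) {
      lastAckedSeqno = Math.max(lastAckedSeqno, seqno);
      lock.notifyAll();
    }
  }

  /** Blocks until seqno is acked, or fails once writeTimeoutMs has elapsed. */
  void waitForAckedSeqno(long seqno, long writeTimeoutMs)
      throws InterruptedIOException {
    long begin = System.nanoTime();
    synchronized (lock) {
      while (lastAckedSeqno < seqno) {
        long elapsedMs =
            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        if (elapsedMs >= writeTimeoutMs) {
          // Give up instead of waiting forever for an ack that never comes.
          throw new InterruptedIOException("No ack received for seqno "
              + seqno + " after " + elapsedMs + " ms");
        }
        try {
          // Wake up at least once a second to re-check the deadline.
          // The lower bound of 1 avoids wait(0), which would block forever.
          lock.wait(Math.max(1, Math.min(1000, writeTimeoutMs - elapsedMs)));
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Interrupted while waiting for ack");
        }
      }
    }
  }
}
```

The short periodic wake-up means the caller notices the deadline even if no thread ever calls `ack`, which is the hung case this discussion is about.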
This timeout is very long. For a 3-node pipeline, it will be 8 minutes + 3 * 5 seconds (for the extension), i.e. 8 minutes 15 seconds.
I'm not sure I have a better suggestion for the timeout.
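The arithmetic behind that figure can be sketched as follows. The constant names and the helper class `WriteTimeoutExample` are hypothetical, and the 8-minute base and 5-second per-node extension are simply the values quoted in this comment, assumed here rather than read from the client configuration.

```java
/** Hypothetical helper reproducing the timeout arithmetic quoted above. */
final class WriteTimeoutExample {
  // Assumed values taken from the review comment, not from a live config.
  static final int BASE_WRITE_TIMEOUT_MS = 8 * 60 * 1000; // 8 minutes
  static final int EXTENSION_PER_NODE_MS = 5 * 1000;      // 5 seconds per datanode

  /** Mirrors the diff's fallback: assume a 3-node pipeline when nodes is null. */
  static int pipelineSize(Object[] nodes) {
    return nodes != null ? nodes.length : 3;
  }

  /** Base timeout plus one extension per datanode in the pipeline. */
  static int datanodeWriteTimeoutMs(int numNodes) {
    return BASE_WRITE_TIMEOUT_MS + EXTENSION_PER_NODE_MS * numNodes;
  }
}
```

With these assumed constants, `datanodeWriteTimeoutMs(pipelineSize(null))` evaluates to 495000 ms, which is 8 minutes 15 seconds.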
One question: I believe we saw this problem in a hung HiveServer2 (HS2) process. Do we know how this problem causes the entire HS2 instance to hang? I would have thought this issue would block the closing of a single file on HDFS, while other files open within the same client could still progress as normal.
@sodonnel We did not do much troubleshooting on the Hive side. It looks like the whole HS2 instance was hung; it did not accept any new connections.
Thanks. I guess we can go ahead with this change, but even with it, HS2 may well hang for 8+ minutes. It's hard to know for sure without knowing why this problem caused the whole instance to hang.
💔 -1 overall
This message was automatically generated.
Thanks for the review @sodonnel, merging this.
(cherry picked from commit bd3da73)
(cherry picked from commit bd3da73) Change-Id: I6604bb34e01b2e13ee4a8aafc54dc5126850fd00
No description provided.